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Summary. We propose a general framework for non-normal multivariate data analysis called multivariate 
covariance generalized linear models (McGLMs), designed to handle multivariate response variables, along 
with a wide range of temporal and spatial correlation structures defined in terms of a covariance link 
function combined with a matrix linear predictor involving known matrices. The method is motivated 
by three data examples that are not easily handled by existing methods. The first example concerns 
multivariate count data, the second involves response variables of mixed types, combined with repeated 
measures and longitudinal structures, and the third involves a spatio-temporal analysis of rainfall data. 
The models take non-normality into account in the conventional way by means of a variance function, 
and the mean structure is modelled by means of a link function and a linear predictor. The models 
are fitted using an efficient Newton scoring algorithm based on quasi-likelihood and Pearson estimating 
functions, using only second-moment assumptions. This provides a unified approach to a wide variety 
of different types of response variables and covariance structures, including multivariate extensions of 
repeated measures, time series, longitudinal, spatial and spatio-temporal structures. 
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1. Introduction 


The analysis of non-normal multivariate data currently involves a choice between a considerable 
array of different modelling frameworks, ranging from, say, generalized estimating equations 
(GEE) and time-series models to generalized linear mixed models and model-based geostatistics. 
Each framework allows the modelling of a specific type of dependence or correlation structure, 
without fitting into any clear overall pattern. Current software implementations have, as we 
shall see below, limited capacity in terms of the complexity and size of data that can be handled. 

This situation stands in sharp contrast to the univariate case, where Nelder and Wedderburn’s 
(1972) generalized linear models (GLMs) provide a unified and versatile approach to regression 
modelling of normal and non-normal data, implemented in an efficient fitting algorithm. A 
further advantage of the GLM approach is that estimation and inference for GLMs require only 
second-moment assumptions. 

In order to obtain a multivariate modelling framework of comparable range and versatility, we 
shall propose the class of multivariate covariance generalized linear models (McGLMs), which, 
following Pourahmadi (1999), are specified via separate link functions and linear predictors 
for the mean vector and covariance matrix, respectively. This allows a unified approach to 
analysis of multivariate correlated data, taking into account response variable of mixed types, 
and allowing a wide range of covariance structures for repeated measures, longitudinal, spatial 
and spatio-temporal data. The models are fitted by means of quasi-likelihood and Pearson 
estimating functions, based on second-moment assumptions, and implemented in an efficient 
Newton scoring algorithm. 

The idea of modelling a function of the covariance matrix by a linear structure goes back 


at least as far as Anderson (1973), followed later by Chiu et al. (1996), who used the matrix 


logarithm as covariance link function. More recently, the idea was extended in several different 
ways by Pourahmadi| (1999 2011), Pan and Mackenzie (2003) and Zhang et al. (2015), among 
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others. These authors consider mainly the multivariate normal distribution, whereas we shall 


use a variance function to take non-normality into account in the style of Liang and Zeger 


(1986). Contrary to the latter authors we shall, however, emphasize the need to model the 
covariance structure explicitly, rather than treating it as a nuisance parameter. 

The availability of standard software is an indicator for which kind of statistical methods 
are in currently use by the statistical and scientific communities. It is hence interesting to 


note that well-established R packages such as lme4 (Bates et al. 2014) and nlme (Pinheiro 


et al. 2013) do not deal with multivariate response variables. In the Bayesian context the 


flexible packages INLA (Rue et al. 2014) and MCMCpack (Martin et al.; 2011) do not deal with 


multivariate response variables, judging from the package documentation. In R, there are at least 
two generalized linear mixed models packages that can deal with multivariate response variables, 
namely MCMCglmm (Hadfield, 2010), which uses Monte Carlo Markov Chain (MCMC) methods 
in the Bayesian framework, and the package SabreR (Crouchley 2012), which uses marginal 
likelihood, but is limited to dealing with at most three response variables. The modelling of the 
covariance structure is currently restricted to making a selecting from a short list of pre-specified 
covariance structures, such as autoregression or compound symmetry. We were not able to find 
any R packages for fitting joint mean-covariance models, not even in the multivariate normal 
case. In SAS the GLIMMIX procedure for generalized linear mixed models (GLMMs) deals with 
multivariate response variables, but is limited to the exponential family of distributions and a few 
pre-determined covariance structures (SAS Institute, 2011). Other software platforms for fitting 


generic random effects models via MCMC, such as JAGS (Plummer. 2003) or WinBUGS (Lunn 


et al., 2000), can deal with multivariate response variables, but carry substantial overheads in 


terms of computational times and convergence checks, while being restricted to a small set of 
pre-specified covariance structures and probability distributions. These limitations on current 
software availability for joint mean-covariance modelling of multivariate response variables may 
reflect either a lack of interest on the part of software users, or a lack of sufficiently flexible 
modelling frameworks. In any case, we will use the latter as motivation for developing the new 
class of McGLMs. 

We now present three correlated data examples along with a short review of currently avail¬ 
able methods for each type of data. The examples were selected in order to highlight some of 
the limitations of current methodology, while illustrating the range of different problems that 
may be handled by the McGLM method. 


1.1. Data set 1: Australian health survey 
The first data set is from the Australian Health Survey for 1987-1988 (Deb and Trivedi, 1997 


Cameron and Trivedi 1998). We selected the following five count response variables for our 


analysis: number of consultations with a doctor or specialist (Ndoc) or with health professionals 
(Nndoc); total number of prescribed and non prescribed medications used in the past two days 
(Nmed); number of nights in a hospital during the most recent admission (Nhosp) and number 
of admissions to a hospital, psychiatric hospital, nursing or convalescence home in the past 12 
months (Nadm). The data set had nine covariates concerning social conditions (see Appendix 
for details). There were 5190 respondents and no missing data. 
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Fig. 1 . Histograms for each outcome variable of the Australian health survey data. 
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This example illustrates the fairly common situation of a multivariate regression problem 
with non-normal (discrete) response variables. The histograms in Figure [l] suggest that the five 
error distributions may not be identical, and hint at potential problems with excess of zeroes 
and under/overdispersion. These problems may, in turn, reflect on the solution to the main 
questions of the analysis, namely assessing the effects of the covariates on each outcome, and 
determining the residual correlation structure. 

Given currently available software, it is a daunting task to select a suitable marginal error 
distribution for each of the five response variables. Besides the classical Poisson and negative 
binomial distributions, other distributions such as the Neyman Type A (Dobbie and Welsh 


2001) or the Poisson-inverse Gaussian (PIG) (Holla 1967) may be relevant. Different distribu¬ 


tions may have to be fitted by separate software packages, each of which comes with its own set 
of problems due to badly behaved likelihood function etc. 

If we decide to use formal methods of model selection, we are faced with the choice of selection 
criterion, such as the Akaike or Bayesian information criterion in the likelihood framework, or 
the deviance information criterion in the Bayesian framework. The Bayesian case involves 
additional work due to the need for choosing suitable prior distributions. These problems 
persist in the special case where all error distributions belong to the same family. One option 


is the multivariate Poisson regression (Tsionas 1999), which is suitable for multivariate count 


data, but is restricted to positive correlations and equidispersed data. A second option is the 
multivariate negative binomial distribution proposed by Shi and Valdez (2014). Such models are 
not easy to fit, and require careful attention to the implementation of algorithms and starting 
values. The assumption of a common error distribution required for these models may, however, 
not be satisfied in practice, and methods for handling the case of unequal marginal distributions 
do not seem to be easily available. 

A different approach for correlated data is the family of generalized linear mixed models 


(GLMM) (Breslow and Clayton; 1993; Fong et ah, 2010), which is based on specifying a GLM 
conditionally on a multivariate latent distribution, often the multivariate normal. A specific 


example of a GLMM for multivariate count data was presented by Rodrigues-Motta et al. 


(2013). GLMMs are computationally demanding, and many different algorithms have been 
proposed in the past three decades, see McCulloch (1997) and Fong et al. (2010) for reviews 
and further references. 

A further aspect of GLMMs that gives rise to concern is the general lack of a closed-form 
expression for the likelihood and the marginal distribution of the data vector. This makes model 
selection even more complicated than for the marginal models discussed above. A related ques¬ 
tion is the special interpretation of parameters inherent from the construction of GLMMs. Thus, 
the covariate effects are conditional on the latent variables, whereas the correlation structure is 
marginal for the latent variables rather than for the response variables. An interesting discussion 
of random-effects and marginal models may be found in Lee and Nelder (2004). 

Additional methods for specifying models for multivariate response variables include the 


copula models ( 

<xupskii and Joe 

2013 

) and the class of hierarchical generalized linear models 

(Lee and Nelder 

1996 

). The fact that several different approaches are available for multivariate 


regression modelling, none of which is particularly easy to use, amplifies our call for a universal 
multivariate modelling framework, preferably one that facilitates model selection and allows 
marginal interpretation of parameters. 


1.2. Data set 2: Respiratory physiotherapy on premature newborns 

We consider some aspects of a prospective study to assess the effect of respiratory physiotherapy 
on the cardiopulmonary function of ventilated preterm newborn infants with birth weight lower 
than 1500 g. The study had three response variables: respiratory rate (RR), heart rate (HR) and 
oxygen saturation (02Sat). The HR and C^Sat data were collected by electronic monitoring and 
RR by means of a stopwatch. Response variables were taken three times: before starting the 
physiotherapy (Evaluation l), immediately after finishing (Evaluation 2), and five minutes 
after finishing the physiotherapy (Evaluation 3). Sixteen newborns were evaluated in consec- 
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Fig. 2. Individual and average (solid line) trajectories by outcome and evaluation for the Respiratory physio¬ 
therapy data. 


utive sessions by two therapists at the neonatal unit. The number of evaluation days varied 
between 2 and 7 days. The data set has 16 covariates concerning health conditions and there 
are 270 cases (see the Appendix). Figure [2] shows the individual and average trajectories by 
outcome and evaluation. 

The main goal of the investigation was to assess the effect of respiratory physiotherapy on the 
outcome variables, while taking into account the effects of covariates and the correlation induced 
by the repeated measures and the longitudinal structures. A special feature of these data is that 
the outcome variables are of mixed types. Thus, the variables HR and RR are continuous, whereas 
the oxygen saturation variable 02Sat takes values in the unit interval, including about 5% exact 
ones, making it hard to propose a suitable probability distribution for this variable. We may, of 


course, use for example the beta (Bonat et al., 2015) or the simplex distribution (Zhang 2014) 


with some ad hoc method for dealing with the exact ones. A better option may be to use the 
beta distribution inflated with ones (jOspina and Ferrari 2010), but this model is complicated to 


fit and interpret. It may hence be preferable in this situation to use a quasi-likelihood method 
based on second-moment assumptions, which is easier to fit and interpret. 

Similar to what we saw in Example 1, the literature may be divided into two main approaches: 


marginal models, mostly based on 

the GEE approach (jO’Brien and Fitzmaurice 

2004, 

Rochon 

1996 

Gray and Brookmeyer 

2000 

), and random-effects models based on GLMMs, see 

Verbeke 


et al. (2014). These authors also provide an extensive review of models for response variables 


of mixed type, whereas Fieuws et al. (2007) reviewed random-effects models for multivariate 
repeated measures. The question of how to model the covariance structure for repeated measures 
and longitudinal data is often solved by choosing from a short list of options, such as compound 
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symmetry, autoregressive, banded and unstructured (Diggle et al. 2002). Such choices are, 


however, not suitable for the combination of repeated measures and longitudinal data found in 
the present data, thereby motivating the development of a more general and flexible approach 
for covariance modelling in multivariate data analysis. 


1.3. Data set 3: Venezuelan rainfall data 

This example concerns monthly rainfall data from 80 stations in the Venezuelan state of Guaarico 
for a period of 16 years (192 months). The data set has 15360 cases with 1092 missing data. 
We also have the spatial coordinates (latitude and longitude) of the 80 stations available, along 


with the covariatc height (height above sea level). The data were previously analyzed by Sanso 


and Guenni (1999) using Bayesian MCMC methods, based on a censored and transformed 


multivariate normal distribution. 

The statistical modelling of rainfall data involves a number of challenges, such as the need 
for simultaneous modelling of seasonal and geographical variation, the complicated nature of 
the spatio-temporal correlation structure, the special form of the marginal distribution (having 
a discrete component at zero), and the possible influence of the sampling scale on the form 
of the analysis (Dunn 2004). The plots shown in Figure [3] illustrate some of these features 


for the Venezuelan rainfall data. In particular, the histogram in panel D highlights the right- 
skewed distribution and the considerable proportion of exact zeroes (around 13%), whereas the 
approximate linearity of the Taylor plot in Panel C suggests a variance function of power form. 
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Fig. 3. Time series plot for Venezuelan rainfall data with fitted values (A). Voronoi tessellation map based 
on spatial coordinates, colored by the average monthly rainfall for the whole period (B). Taylor plot (spatial 
mean and variance in double logarithmic scale) (C). Histogram of monthly rainfall for the whole period (D). 


A simple model for the marginal distribution of total rainfall Y over a certain time period 
is to write Y = Ri + ■ ■ ■ + Rn, where N is the number of rainfall episodes, assumed to be 
Poisson distributed, and the i.i.d. variables Ri are the amounts of rain for each episode, with 
the convention Y = 0 for N = 0, corresponding to a discrete component at zero. A special 
case of this compound Poisson model is the Tweedie family (Jprgensen 1997) (where the Ri 
are gamma distributed), with power variance functions, in agreement with the Taylor plot of 
Figure^ The Tweedie model has been successfully applied to rainfall data by Dunn (2004) and 


Hasan and Dunn (2010, 2012). These authors, however, assume independent data, which is not 


realistic for the present data set. 
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A popular approach for analyzing rainfall data (Chandler and Wheater 2002 Sigrist et al 


2012]) is to use separate models for the discrete component, indicating the number of wet periods, 


and the continuous component, indicating the amount of rain for wet periods (Stern and Coe 


1984 Wilks, 1990). A variety of distributions have been proposed for modelling the continuous 


component of rainfall under the independence assumption, including the log-normal, Weibull, 


generalized log-normal, gamma and mixed gamma distributions (Hasan and Dunn 2010, 2012). 


While these distributions may have their merits for analyzing rainfall data, the above compound 
Poisson model seems more natural, and the Tweedie family is flexible enough to mimic many 
of the shapes of other distributions. 

Turning now to the question of spatio-temporal modelling of rainfall data, one possibility 


is to use models based on marked point processes (Wheater et al. 2000; Cowpertwait et al. 


2006), which may be useful for detailed simulation studies. Another approach is to follow the 


conventional geostatistical paradigm, assuming a parametric covariance function (Diggle and 


Ribeiro 2007). There are several parametric families available for modelling the joint space- 


time covariance structure (Cressie and Huang 1999; Gneiting; |2002 ), although there are issues 
with their interpretability and computational complexity, making it difficult to handle large 
data sets with this approach. 

A different approach to spatio-temporal modelling is to take into account the fundamental 
difference between the spatial and temporal dimensions, the latter obeying a natural ordering 
which is not present in the spatial dimension. It may hence be natural to assume a dynamic 


temporal evolution model in combination with spatially correlated errors, see Sanso and Guenni 
(1999 2004); Sigrist et al. (2012) and the monograph by Cressie and Wikle ( 2011| ). While provid- 
ing a flexible form of spatio-temporal modelling, this method is also computationally demanding, 
and handles response variables with a discrete component at zero by means of a censored mul¬ 
tivariate normal distribution, which does not provide as reasonable an interpretation as the 
Tweedie model. 

A significant simplification may be obtained by assuming that the spatial domain is discrete, 
rather than being continuous as in the last two methodologies discussed above. This approach 
is used for example in disease mapping (Besag et al., 1991), where the covariance structure is 


determined by a neighborhood matrix. This is computationally less demanding, because for a 
given neighborhood structure we may specify the inverse covariance (or precision) matrix. The 
precision matrix, in turn, contains information about the structure of conditional independence 
of the data (Rue and Held; 2005). The proposed simplification may hence be seen as a reasonable 
compromise between model complexity and the capacity to model real data sets, achieveable 
by modelling the covariance structure using a linear combination of neighborhood matrices. 
To accommodate rainfall data, such a modelling strategy should allow for Tweedie distributed 
response variables with power variance functions. Section 2 presents the class of McGLMs, and 
Section 3 considers the Newton scoring algorithm. The three data examples presented here are 
analyzed in Section 4 using McGLMs. The results are discussed in Section 5, including some 
directions for future investigations. 


2. Multivariate covariance generalized linear models 

In this Section we will present the McGLM approach as an extension of GLMs. Let Y be an 
N x 1 response vector, X an N x k design matrix and (3 a k x 1 regression parameter vector. 
A GLM can be written in the following form: 

E(10 = V = g-\X(3) 

Var (Y) = S = V(^;p)5(ro/)V(/x;p)5 (1) 

where g is the link function, V(/x;p) = diag('d(/i;p)), is a diagonal matrix whose main entries 
are given by the variance function i3(-',p) applied elementwise to the vector /x. Finally p and 
To are the power and dispersion parameters, respectively, and I denotes the N x N identity 
matrix. 
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The success enjoyed by the GLM framework comes from its ability to deal with a wide 
range of non-normal data using just two separate functions, namely the link and variance func¬ 
tions. The variance function plays an important role for GLMs, since different choices imply 
different assumptions about the response variable distribution. The power variance functions 
i?(/x;p) = p p are a frequent choice in the GLM framework. It characterizes the Tweedie family 
of distributions, whose most important special cases are the normal (p = 0), Poisson (p = 1), 
Gamma (p = 2) and inverse Gaussian (p = 3) distributions (Jprgensen; 1987, 1997). But in 
spite of its flexibility the GLM approach has some limitations: it deals only with independent 
and univariate response variables, and the variance function is assumed to be known. 

Our main objectives are to extend the GLM approach to deal with first non-independent 
data and second multivariate response variables. A third objective is to estimate the power 
parameter, which works as automatic model selection. 

The Tweedie family is quite flexible for handling continuous response variables, but it is less 
flexible for discrete response variables. Therefore, we propose to use the Poisson-Tweedie family 


to deal with discrete data (El-Shaarawi et al.; 2011). The Poisson-Tweedie family has variance 


function $(/x;p) = p + p p , and many important models for count data are special cases, for 
example the Hermite (p = 0), Neyman Type A (p = 1), negative binomial (p = 2) and Poisson- 
inverse Gaussian (p = 3), 


see 


Jprgensen and Kokonendji (2014). When using the Poisson- 
Tweedie family, the matrix in |lj takes the special form 51 = diag(jLt) + V(jLt;p) = (tqI)'V (p; p)? 
because the dispersion parameter appears only in the second term. Another important case 
is when the response variable is binary, bounded, or the number of successes within a given 
number of trials. In that case the binomial variance function i?(/x) = p(l — p) may be useful. 

It is important to emphasize that by using just these three sets of variance functions we can 
deal with most frequently occurring types of response variables. Such flexibility is very useful, 
for example when analysing data set 1, where the choice of count distribution for each response 
variable is not obvious. Using the Poisson-Tweedie variance function we can deal with zero- 
inflation and overdispersion, such as that observed in data set 1. A similar situation appears 
for data set 2, where we have a bounded response variable with exact ones, which can be well 
modelled using the binomial variance function. The Tweedie family, through its power variance 
function, can model zero-inflated and right skewed response variables, such as the monthly 
rainfall data. 

In Eq. (JT]) it is easy to see where the assumption of independent observations appears in the 
covariance matrix, which in turn suggests how to introduce dependence between observations. 
It is enough to change the identity matrix / to a non-diagonal matrix f2(r). This approach 
is similar to the idea of a working correlation matrix in the Generalized Estimation Equation 


(GEE) fra m ework (Liang and Zeger; 1986; Zeger et al. 1988). Our approach differs from GEE 


in that we propose to model fi(r) in terms of a linear combination of known matrices, following 


the ideas of Anderson (1973) and Pourahmadi (2000), i.e. 


h(ft(r)) = t 0 Z 0 H-b t d Z d . 


( 2 ) 


Here h is the covariance link function , Z^ with d = 0,..., D are known matrices reflecting the 
covariance structure, and r = (to, ... , td) is a (D + 1) x 1 parameter vector. This structure is 
a natural analogue of the linear predictor of the mean structure, and we call it a matrix linear 
predictor. Plugging the matrix linear predictor (|2j) into Eq. ([!]), we obtain a so-called covariance 
generalized linear model. 

Two new issues appear here, concerning how to specify the covariance link function h and 
how to define the matrices Z^. The first issue was discussed by Pinheiro and Bates (1996) and 


Pourahmadi (2011). In this paper we will focus on well-known covariance link functions, such 


as the identity and the inverse functions. In Section [4] we show how to specify the matrices Z^ 
in order to obtain some well-known models for time series, spatial and space-time data. 

Many authors claim that a suitable covariance link function must provide an unrestricted 
and interpretable parametrization. While laudable, such a goal is probably over-optimistic, and 


does not seem to have been achieved yet, at least not for the general case (Pourahmadi 2000 


Pinheiro and Bates 

1996 

). The modified Cholesky decomposition proposed by 

Pourahmadi et al. 
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(2007) presents both features, but is restricted to the case where there is a natural ordering of 
the observations. In general, the identity and inverse covariance link functions allow for simple 
interpretations, but these covariance link functions do not provide unrestricted parametrizations. 
In fact it is quite hard to define the parameter space for r. In Section [3] we propose the so-called 
reciprocal likelihood algorithm where we use a tuning constant in order to control the step length 
of the algorithm and avoid unrealistic values for the parameter vector r. From an algorithmic 
point of view, there is hence no need to require an unrestricted parametrization. 

The second main contribution of this paper is to extend the covariance generalized linear 
model to deal with multivariate response variables. Let Y nxR = { Yi,...,Tr} be a response 
variable matrix and let M/vxij = {/^i, • • • ,^ R } denote the corresponding matrix of expected 
values. To indicate that each response variable Y r has its own covariance matrix we use the 
notation S r = V r (/x r ;p) 2 r 2 7 .(r)V r (/ 2 r .;p) 2 . It is important to emphasize that this matrix models 
the covariances within each response variable. We introduce the Rx R correlation matrix Yf, to 
model the correlation between response variables. To specify the joint covariance matrix for all 
response variables, we adopt the generalized Kronecker product proposed by |Mart'inez-Beneito 


(2013) in the context of multivariate disease mapping. We hence define the McGLM by 


E(Y) = M = {g^ 1 (X 1 f3 1 ),...,g] t \X R p R )} 

Var(Y) = C = 'Er § E b (3) 


G ~ ~ t ~ T 

where Er (g) S 5 = Bdiag(Si,..., E R )(E b <g) J)Bdiag(S 1 ,..., Y R ) is the generalized Kronecker 

product. The matrix E r denotes the lower triangular matrix of the Cholesky decomposition 

of E r . The operator Bdiag denotes a block diagonal matrix and I denotes an R x R identity 

matrix. 


3. Estimation and inference 


In this Section we describe the estimating function approach used to estimate the model pa¬ 
rameters ( |j0rgense n and Knudsen; 2004). We divide the set of parameters into two subsets, 
6 = (/3 t , A 1 ) 1 . In this notation (3 = {fll, ■ ■ ■ ,/3r) T denotes a K x 1 vector containing all 
regression parameters. Similarly, we let A = (pi,..., Pr(r-i)/ 2 ,Pi > ■ ■ ■ :PR> T J> ■ ■ ■ > t r) T be a 
Q x 1 vector of all dispersion parameters. 

To simplify the discussion, let y = (Yj ,..., Y^) T be the NR x 1 stacked vector of the 
response variable matrix Y nxR by columns. Similarly, let A4 = (f^J ,..., f-i R ) T be the NR x 1 
stacked vector of the expected values matrix MjVxR by columns. 

We adopt the following quasi-score function for the regression parameters: 


^((3,\) = D r C~\y-M) 


where D = VpA4 is an NR x K matrix, and Vp denotes the gradient operator. The K x K 
matrix 

S/3 = E(V/3^/3) = -D t C~ 1 D (4) 

is the sensitivity matrix of and the K x K matrix 

V /3 = Var(^p) = D t C~ 1 D (5) 


is the variability matrix of 

Similarly, we adopt the Pearson estimating function, defined by the components 

VW (A A ) = tr (W Xl {r T r-C)) for i = l,...,Q, ( 6 ) 

where W\. = —dC~ l /d\i and r = y — M.. Details on how to compute these weight matrices 
are given in Section [3.1| 
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The entry ( i,j ) of the Q x Q sensitivity matrix of fix is given by, 


Sa„ = E ) = -to (1 Vx,CW x ,C) 


m 


We may show using results about characteristic functions of linear and quadratic forms of non¬ 
normal variables (Knight 1985), that the entry (i,j) of the Q x Q variability matrix of fix is 
given by 

NR 

V Aij = Confix,, fix,) = 2tr( W A . CWx 3 C) + 5>, (4) (W xM^fii, (8) 


l=l 


where k^ 4> denotes the fourth cumulant of 0) to be discussed below, see Eq. (14). To take 
into account the covariance between the vectors (3 and A, we compute the cross-sensitivity and 
-variability matrices. The entry (i,j) of the Q x K cross-sensitivity matrix between (3 and A is 
given by 

SftA, = E ( J-fe) = 0. (9) 


In a similar way the entry (i,j) of the K x Q cross-sensitivity matrix between A and (3 is 
given by 


S ^Ai = E 




( 10 ) 


Va A = E 


( 11 ) 


We can show that the entry (i,j) of the K x Q cross-variability matrix between (3 and A is 
given by 

r NR NR NR 

LEE 

_k=l 1=1 m=1 

where A = D 1 C~ 1 and denotes the j th column of A. In a similar way W^"' 1 denotes the 
lm th entry of the matrix Wxi- Furthermore, the joint sensitivity matrix of fip and fix is given 
by 

S ° \S A/ 3 S A ) ’ 

whose entries are defined by 0, 0 and ( |10[ ). Finally, the joint variability matrix of fip 
and fix is given by 

''•-(£ V? 
whose entries are defined by 0, 0 and ©• 

Let 6 = ((3 , A ) T be the estimating function estimator of 6. Then the asymptotic distri¬ 
bution of 6 is 

e ~ n( 0, j” 1 ) 

where J^ 1 is the inverse of Godambe information matrix, 


Jfl 1 = Si^VflS 


e > 


where S 0 T = (Sg 1 ) 1 ". 


- a _ izu > _ : _ _ 

Jprgensen and Knudsen (2004) proposed the modified chaser algorithm to solve the system 


of equations fip = 0 and fix = 0, defined by 

(3(i+i) = p{i) -S^V^/^aW) 

A( <+ i) = A« -SjV A (/3 (i+1) ,AW). 


( 12 ) 


The modified chaser algorithm uses the insensitivity property 0 , which allows us to use two 


separate equations to update (3 and A. This procedure was implemented in R (R Core Team 
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2015) and some generic functions are made available in the supplement material. The modified 


chaser algorithm is often quite efficient, but it does not have any way to control the step length. 
Thus, based on ideas from Jensen et al. (1991) we propose the reciprocal likelihood algorithm 
involving an additional tuning constant a to control the step length. The reciprocal likelihood 
algorithm replaces the second equation of © by 

A (m) = A «-[aV’A(/3 (m) ,A«) T V; A (/3( i +i),A«) V ^i SA + SA ]-VA(/3( i+ i),A«). (13) 


The strategy for choosing a used in this paper consists of starting the algorithm with a = 0, and 
continuing with a = 0 as long as the proposed value of A* +1 corresponds to a positive-definite 
covariance matrix. In the opposite case, we increase the value of a by a small quantity (e.g. 
e = 0.01) and try again until the covariance matrix becomes positive-definite, after which we 
return to a = 0, corresponding to the modified chaser algorithm. Compared with conventional 
step length methods, our method is adaptive in the sense that directions where the estimating 
function is far from zero are being less penalized. 

To compute the variance of the dispersion parameter estimators we used the empirical fourth 
cumulants, i.e. 

*, (4) = (14) 


The empirical third central moment was computed based on equation ignoring the expec¬ 
tation. The main advantage of using empirical third and fourth moments is that the resulting 
method depends on second-moment assumptions only. The additional variability induced by the 
use of empirical moments implies, however, increased variability of the asymptotic covariance 
of the dispersion parameter estimators, in particular for small sample sizes. 

The Pearson estimating function ([bj) is unbiased only if the vector of regression parameters (3 
is known. Jprgensen and Knudsen (2004) proposed a bias-correction for the Pearson estimating 
function. The 1th bias-correction term is given by 


b\i = -tr (Jp^Jp 1 ), 


(15) 


where Jp denotes the Godambe information matrix for /3 and = dJp/d A*. The corrected 
Pearson estimating function may be solved using the same algorithm as for the Pearson estimat¬ 
ing function. The variability matrix does not depend on the bias-correction term. This is not 
the case for the sensitivity matrix, but the contribution of the correction term to the sensitivity 
is so small that it can be ignored. 


3.1. Derivatives of the covariance matrix 

The key calculation in relation to the fitting algorithm is to compute the derivative of the 
covariance matrix C. In this Section we will provide details of this calculation for the model 
presented in Eq. Let pi for i = 1,.. . ,R(R — l)/2 denote the correlation parameters. We 
use the convention to stack the lower triangle of the correlation matrix by columns. Let p be 
an R x 1 vector of power parameters. Finally, let tr be a D x 1 vector of dispersion parameters. 
To denote a specific element we use the notation r r d for d = 0,..., D and r = 1,..., R. 

The weight matrix is defined by 




dc- 1 = 1 ac 

d\i d\i 


The partial derivative of C with respect to the element pi is given by 


dC 

dpt 


Bdiag(Si,...,S i? ) 



Bdiag(Ei,...,Efl). 


Using elementary matrix calculus the partial derivative of C with respect to the element p r is 
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Table 1. Power and dispersion parameter estimates and standard errors (SE) for the Australian healt 


survey data. 



Ndoc 

Nndoc 

Nmed 

Nhosp 

Nadm 


Estimate SE 

Estimate 

SE 

Estimate SE 

Estimate 

SE 

Estimate 

SE 

p 

1.1414 0.3513 

1.1345 

0.2638 

1.2653 0.2830 

1.6394 

0.1510 

1.7419 

0.5783 

TO 

1.0820 0.4352 

3.3761 

1.2598 

0.3570 0.0646 

18.2059 

3.2348 

1.2065 

0.9371 


given by 


^ j r T' r T' 

= Bdiag(0, • • •, ■ • ■, 0)(S 6 ® J)Bdiag(S 1 


dp r dp r 

+Bdiag(Si,..., E/j)(E & ® I)Bdiag(0,..., 


tm r 

dp r 


, 0 ). 


(16) 


Similar equation may be obtained with respect to the elements in the vector t r . Given the 
block diagonal structure of Eq. (16), it is enough to compute the derivatives dH r /dp r and insert 

(2013) the partial derivative of E r with respect 


in Eq. (|16|). Based on the results from 

dt r 


Sarkka 


to p r and T r d are given by 


and 


dp r 

dt T 




= 1 


dr rd V " dr rd r 

respectively, where the function returns the lower triangular part of the argument and half of 
its diagonal. Now, recalling that S r = V r (/4;p)2fi r (T r )V r (/Lt;p)*, we may hence see that the 
partial derivative with respect to p r and T rd are given by 


<9E r 9V r (/x;p)2 i i dV r (fj.-,p)2 

-S2 r .(r r )V r (/i;p)2 +\ r (pi-,p)^n r {T r ) 


dp r 


dp r 


dp r 


(17) 


and 


<9£ r i dCt r (T r ) i 

= Vr(MP) 2 —^Z - Vr(V,P) 2 , 


rd 


respectively, where 


da r ( Tr ) dh- x {U) 


dT rd 

-B 


Zrd 


(18) 


dr rd dU 

and where U = T r oZ r o + ■ • • + T r £>Z r £>. The derivative in Eqs. © and (fl8|) depends on the 


derivative of the variance function and covariance link function, respectively, and it should be 
evaluated accordingly. 


4. Data analyses 

4.1. Results from data set 1 

In this section we apply the McGLM approach to analyze the multivariate count data set pre¬ 
sented in Section 0 We adopted the log link function, the Poisson-Tweedie variance function 
and the identity covariance link function for the five count response variables. The matrix linear 
predictor is composed of an identity matrix, since we have independent respondents. The lin¬ 
ear predictor is composed of nine covariates plus the intercept for each response variable. The 
covariance structure is described by five power parameters, five dispersions and ten correlation 


parameters. We fitted this model using the modified chaser algorithm (12) and the correction 


term in (15). Table [I] shows the estimates and standard errors for the power and dispersion 
parameters. 

The results in Table [l] show that for the response variables Ndoc, Nndoc and Nmed the sug¬ 
gested distribution is the Neyman Type A (p = 1), which indicates zero inflation relative to 
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the Poisson distribution. Regarding the response variable Nhosp the model indicates that the 
Polya-Aeppli distribution (p = 1.5) is suitable. Finally, the model indicates that for Nadm both 
Neyman Type A, Polya-Aeppli and negative binomial (p = 2) distributions are suitable. This 
result is obtained because the dispersion parameter tq is not different from zero; hence the 
response variable Nadm is equidispersed and all these distributions work well, including the Pois¬ 
son. In this case, we do not have enough information in the data to distinguish between these 
distributions. Therefore, we suggest to opt for the simplest possibility, i.e. the Poisson model. 
The dispersion estimates show weak overdispersion for the response variables Ndoc, Nndoc and 
Nmed and high overdispersion for Nhosp. In order to compare the regression coefficients with a 
conventional model, Figure [4] shows the estimates and confidence intervals for McGLM and a 
conventional Poisson log-linear model for each response variable. The intercept is not shown in 
order to avoid scale issues. 


Poisson —e— McGLM —A- 



P + 1.96SE 


Fig. 4. Regression parameter estimates and 95% confidence intervals by response variable and model for the 
Australian healt survey data. 


The results in Figure [4] show that the two approaches agree in terms of estimates, but differ 
in terms of standard errors. The differences may be explained by the covariance structure. 
The Poisson model assumes equidispersion, whereas the McGLM models allow for a flexible 
modelling of the covariance structure, allowing in particular various degrees of overdispersion 
and zero-inflation. For the response variable Nadm, the model shows that equidispersion is 
suitable, making the McGLM and Poisson confidence intervals similar. On the other hand, 
for the response variable Nhosp, where the overdispersion is strong, the McGLM confidence 
intervals are about five times wider than the Poisson ones. In a similar way, the McGLM 
confidence intervals for Nndoc, Ndoc and Nmed are on average 93%, 38% and 17% wider than 
the corresponding Poisson intervals. These results highlight the importance of modelling the 
covariance structure even when the main interest is in the regression parameters, because the 
covariance structure controls the standard errors for the regression parameters. 

An additional feature of McGLM is that we can estimate the correlation between response 
variables. It is important to emphasize that the estimation of the correlation matrix does not 
inflate the standard errors for the regression coefficients, due to the insensitivity of the quasi¬ 
score function with respect to the covariance parameters. The estimates and standard errors 
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Table 2. Dispersion parameter estimates and standard errors (SE) 
for the Respiratory physiotherapy on premature newborns data. 



RR 


HR 


0 2 Sat 


Estimate 

SE 

Estimate 

SE 

Estimate 

SE 

TO 

134.2669 

24.2386 

95.2933 

32.3082 

0.0129 

0.0840 

n 

40.6929 

11.6902 

82.8065 

15.3124 

0.0010 

0.0051 

T 2 

15.5255 

12.0365 

56.8178 

15.6080 

0.0036 

0.0364 

t 3 

50.2033 

12.6795 

54.2204 

15.5270 

0.0012 

0.0064 

T4 

-18.6080 

19.8331 

39.9998 

24.2276 

-0.0003 

0.0172 

T5 

81.1339 

75.3181 

-143.9326 

91.3125 

-0.0001 

0.0534 

T6 

-63.0717 

57.3648 

105.7698 

68.6823 

-0.0017 

0.0375 


for the entries of the X& matrix were as follows: 


% 


1 

0.1066(0.0161) 

0.1708(0.0156) 

0.0905(0.0164) 

0.1503(0.0160) 


1 

0.0601(0.0144) 

0.0679(0.0156) 

0.0688(0.0147) 


1 

0.0478(0.0144) 1 

0.0699(0.0140) 0.5464(0.0510) 1 


All correlations are significantly different from zero, but only the correlation between Nhosp 
and Nadm is substantial in size. The standard errors are all of a similar magnitude, which is 
natural since all are computed using the same sample size. Furthermore, these correlations take 
into account the effect of all covariates, zero inflation and overdispersion. We know of no other 
statistical method that allows estimation of correlations taking into account all these important 
features. 


4.2. Results from data set 2 

In this section, we apply the McGLM approach to analyze data set 2 from Section 1.2 which 
has response variables of mixed types. There are three response variables, namely HR, RR and 
CLSat, the first two being continuous and the last being confined to the unit interval, having 
exact zeroes. We adopted the constant variance function, identity link function, and identity 
covariance link function for HR and RR, reflecting a belief that HR and RR are normally distributed. 
For 02Sat we adopted the logit link function combined with the binomial variance function and 
identity covariance link function. We fitted the model using the modified chaser algorithm (12) 
and the correction term (15). 

The matrix linear predictor is composed of a diagonal matrix (intercept) combined with two 
sets of matrices to model the longitudinal and repeated measures structures. The longitudinal 
structure is modeled by a compound symmetry matrix (of ones), the reciprocal of Euclidean 
distances and reciprocal of Euclidean distances squared. The repeated measures structure is 
described by an unstructured covariance matrix. Since we have three Evaluations to represent 
this structure we need three matrices. Therefore, the matrix linear predictor is a linear combi¬ 
nation of seven known matrices and it is described by 21 dispersion parameters (seven for each 
outcome). Details of the matrix linear predictor is available in the supplementary material. In 
this example we have no power parameters and the matrix contains three parameters. Table 
[ 2 ] shows the estimates and standard errors for the dispersion parameters. The parameters (ti, 12 
and 73 ) and ( 74,75 and tq) are associated to the repeated measures and longitudinal structures, 
respectively. 

The results in Table [2] show that the longitudinal structure is not significant for all response 
variables. The repeated measures structure is significant for RR and HR. For the outcome RR the 
estimate of 72 is not significant, which means that the covariance between Evaluation 1 and 
Evaluation 3 is not different from zero. For the outcome C^Sat there are no significant disper¬ 
sion coefficients, so we may assume independent observations. The final model is composed of 
the repeated measures structure for the response variables RR and HR and independent structure 
(only tq) for C^Sat. 
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We have a set of 16 covariates entering the linear predictor, and we used a stepwise procedure 
to select the most significant set of covariates. This procedure selected a different set of covariates 
for each outcome. After completing this procedure, we included the covariate of particular 
interest, namely treat, which is a factor with two levels. Our goal is to assess whether or not 
the treatment has an effect on each response variable. 

In order to evaluate the effects of the covariance structure on the regression coefficients, 
Figure [5] shows estimates and confidence intervals obtained from the final McGLM and a quasi 
GLM using the same link and variance functions as for the McGLM. The linear predictors for 
the outcome RR, HR and C^Sat are composed of 10, 13 and 10 regression coefficients, respectively. 
The intercept is not shown, in order to avoid scaling issues. It is important to emphasize that 
the last two regression coefficients (numbered 9-10, 12-13 and 9-10, respectively) measure the 
treatment effects. 


Independent —©— McGLM —A- 



Fig. 5. Regression paratemeter estimates and 95% confidence intervals by outcome and model for the Respi¬ 
ratory physioterapy data set. 

The results in Figure [5] show that in general the confidence intervals from McGLM are wider 
than the corresponding ones based on quasi GLM. For the outcome RR and HR the standard 
errors from McGLM are on average 19% and 27% greater than the corresponding quasi GLM 
ones, respectively. These results are as expected, because correlation within response variables 
generally implies less information in the data on the regression coefficients. It is hence interesting 
to note that, in contras to the other regression coefficients, the two treatment coefficients for each 
response variable have smaller standard errors using McGLM than those obtained using quasi 
GLM. We attribute this effect to the fact that the treatment covariate is also used for modelling 
the covariance structure, which apparently improves on the estimation of the treatment effect, 
although we are uncertain if this is a general feature, or if it is specific to this data set. 

Regarding the outcome 02Sat the standard errors from McGLM are on average 4% smaller 
than those obtained by quasi GLM. This may be explained by the difference between the 
estimates of to, which are 0.0129 and 0.0138 for McGLM and quasi binomial, respectively. 
This small difference seems to be due to the use of the corrected Pearson estimator in our 
model. 

Regarding the treatment effects, the final model shows that for RR the Evaluation 1 differs 
from Evaluation 3, but not from Evaluation 2. On the other hand, for HR the model shows 
that the Evaluation 1 differs from Evaluation 2, but does not differ from Evaluation 3. This 
result contrasts with the quasi GLM analysis, which does not show any significant difference 
between Evaluation 1 and Evaluation 2. Finally, our final model shows that for C^Sat both 
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constrasts are significant. The estimated correlation matrix between response variables was as 
follows: 




1 

0.1682(0.0607) 1 

-0.0482(0.0607) -0.0733(0.0608) 1 


We observe that there is a significant but weak correlation between RR and HR, whereas the 
other two correlation estimates are not significant. 


4.3. Results from data set 3 

In this section we apply the McGLM approach to the space-time data set presented in Section 


We hence adopted the log link function, the power variance function and the inverse covariance 
link function. The linear predictor is expressed in terms of Fourier harmonics (seasonal variation) 
and B-splines (general trend), 


1.3 The response variable monthly rainfall is right-skewed with a positive probability at zero. 


4 

= A) + /3icos(2vrt/12) + /3 2 sin(27rf/12) + ^ /3 ' k+2 B k (j) 

k=i 

where t = 1,..., 192 indexes months, and j = 1,..., 16 indexes years. The B^{j) form a B-spline 
basis with four degree of freedom. We used only the first term of the Fourier harmonics, since 
the second and third terms were not significant. 

The main challenge in the analysis was to model the covariance structure suitably, in order to 
take into account the spatial and temporal autocorrelation and, if necessary, the interaction be¬ 
tween space and time. We propose to model the space-time structure using a linear combination 
of neighborhood matrices. Let us motivate our approach using the Conditional Autoregressive 
(CAR) model. The CAR model specifies the inverse of the covariance matrix by 

= t(D — pW) 

where W is a neighborhood matrix and D is a diagonal matrix with the number of neighbors 
in the main diagonal. The model is parametrized by the precision (r) and autocorrelation (p) 
parameters. The matrices D and W can model both space, time and space-time interaction in 
a straightforward way, by using different neighborhood matrices. 

For the Venezuelan rainfall data, we used the spatial coordinates (latitude and longitude) 
to build a Voronoi tessellation (see Figure [3^3) . The tessellation structure helps us to specify a 
neighborhood structure. Let us denote this structure by D s and W s . Temporal neighbors are 
naturally specified by the time structure. Let us denote these matrices by D t and Wt- In the 
space-time case we have replicates of the space and time structures, so assuming independent 
replicates, the full neighborhood matrix is block-diagonal. Finally, the interaction between 
space and time is described by the Kronecker product between the space and time neighborhood 
structures. Let us denote these matrices by D st = D s <g> D t and W s t = W s <g> Wt- Thus, the 
matrix linear predictor is given by 

ft _1 (r,p) = Tt{Dt + ptW t) + t s (D s + p s W s ) + T st (D st + p s tW s t), 

which may be written as a linear combination of known matrices as follows: 

fi -1 (r) = t 0 Z 0 + r\Z\ + t 2 Z 2 + r 3 Z 3 + 74Z4 + T5Z5, ( 19 ) 

where Zq = D t , Z\ = Wt, Z 2 = D s , Z 3 = W s , Z± = D st and Z 3 = W s t- I n a similar way, 
we find that p t = ti/t 0 , p s = r 3 /r 2 and p st = t 5 /t a . Finally, r t = r 0 , t s = r 2 and T st = r 4 . In 
practical situations, fitting this model may be hard if the autocorrelation parameters are near 
1. In the Bayesian context, it is common to fix the autocorrelation parameter p at the value 1, 
which is the so-called Intrinsic Conditional Autoregressive (ICAR) model. 
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Table 3. Covariance, power and dispersion parameter estimates and standard 
errors (SE) for each model for the Venezuelan rainfall data. 



Model 1 

Model 2 

Model 3 

Model 4 


Estimate 

SE 

Estimate 

SE 

Estimate 

SE 

Estimate 

SE 

p 

1.2731 

0.0380 

1.1241 

0.1926 

1.2340 

0.1066 

1.1443 

0.2475 

% 

1.1505 

0.0349 

- 

- 

- 

- 

0.0339 

0.0074 

n 

0.5285 

0.0268 

- 

- 

- 

- 

0.0264 

0.0051 

t 2 

- 

- 

0.9252 

0.2175 

- 

- 

0.6084 

0.2844 

n 

- 

- 

0.9169 

0.2185 

- 

- 

- 

- 

h 

- 

- 

- 

- 

0.4487 

0.0390 

0.1791 

0.0333 

t 5 

- 

- 

- 

- 

0.3917 

0.0399 

- 

- 

P 

0.4593 

0.1079 

0.9910 

0.0038 

0.8729 

0.0159 

0.7787 

0.0998 


In order to investigate the space-time structure we fitted three models, cf. Table [3j The 
first (Model 1) considers time effects. The second (Model 2) considers space effects and the 
third (Model 3) the space-time interaction effects. After this procedure we decided that the 
autocorrelation parameters associated with space and interaction effects must be fixed at the 
value 1. We then fitted the final model (Model 4) with all three components time, space and 
space-time interaction. All models were fitted using the reciprocal likelihood algorithm (13). 
Table [3] presents estimates and standard errors for the power and dispersion parameters for each 
of the four models. 

The results in Table [3] show a moderate temporal autocorrelation, but high spatial and 
space-time interaction autocorrelations. All the power parameter estimates are in the interval 
1 < p <2, suggesting a compound Poisson distribution, as expected, since the response variable 
is continuous with exact zeros. The fitted values and 95% confidence interval are shown in 
Figure [3]A above. 

The model allows us to make prediction using the Best Linear Unbiased Predictor (BLUP). 
Furthermore, we may obtain predictions for different response variable measures, such as the 
mean number of precipitations events per month, the avarage amount of precipitation per event, 
or the probability of no precipitation. Such extensions are straightforward and will be presented 
elsewhere. 


5. Discussion 

In this paper we have developed a comprehensive statistical modelling framework for correlated 
data, obtained by using separate pairs of link functions and linear predictors for the mean and 
covariance structures in the style of Pourahmadi (2011). Motivated by three data examples, we 
have shown that the McGLM framework can deal with a wide variety of correlation structures 
where existing modelling approaches have difficulties. Following Nelder and Wedderburn (1972), 
there are obvious pedagogical advantages to the modular specification of models in McGLM, in- 
centivating the researcher to think constructively about the covariance structure, while drawing 
on previous experiences from GLM modelling. The generalized Kronecker product facilitates 
the specification of the covariance structure for multivariate response variables. 

The main features of the McGLM framework include the ability to deal with most com¬ 
mon types of response variables, such as continuous, count and binary. Characteristics such as 
symmetry/asymmetry, excess of zeros and overdispersion are easily handled due to the flexibil¬ 
ity of the model class. We can model many different types of dependencies, such as repeated 
measures, longitudinal, time series, spatial and space-time data. All of these features extend 
to multivariate response variables, including the case of mixed response variable types, while 
maintaining the population average interpretations of regression parameters as well as for co- 
variance parameters. This gives a very flexible modelling of the covariance structure compared 
with for example current GEE implementations, where the researcher must choose from a small 
set of pre-defined covariance models. 

The main technical advantage of the McGLM framework is the simplicity of the fitting 
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method, which amounts to finding the root for a set of non-linear equations. Based on second- 
moment assumptions, we use a quasi-score function for the regression parameters and a Pearson 
estimating function for the covariance parameters. The modified chaser algorithm of Jprgensen 
and Knudsen (2004) requires an approximate derivative matrix in the form of the sensitivity 
matrix, which is usually relatively simple. The new reciprocal likelihood algorithm requires the 
additional calculation of the variability matrix in order to stabilize the covariance parameter 
update, resulting in a very efficient algorithm, although the sensitivity matrix may be hard to 
calculate for big data. In such cases a numerical approximation for the sensitivity matrix may 
be used. For both algorithms a careful choice of initial values is required. In any case, we 
avoid using computationally more intensive methods such as MCMC, numerical optimization 
or numerical integration, which are common in the context of random effects models. 

An important feature of the McGLM fitting method is that while the mean parameter es¬ 
timators depend relatively little on the form of the covariance structure, this is not the case 
for the standard errors of the mean parameter estimators, which depend directly on the choice 
of covariance structure. A related matter is that the discussion of the efficiency of the mean 
and covariance parameter estimators is difficult due to the lack of a fully specified probability 
model. 

The current version of the fitting algorithm (available in the supplementary material) is 
a preliminary implementation of the McGLM method. We plan to develop a full McGLM R 
package with a GLM-style interface that takes full advantage of the modular specification of the 
models. There are many possible extensions to the basic model discussed in the present paper, 
including for example facilities for censored data in survival analysis and other special types of 
data, or new estimating functions to handle data not missing at random. The specification of 
the matrix linear predictor is one of the key points of the McGLM approach. While we have 
used some simple and easily interpretable matrices here, there is wide scope for further research 
on the proper choice of the matrix linear predictor. Similarly, other covariance link functions, 
such as the matrix-logarithm Chiu et al. (1996) may also be explored. It is also possible to 
incorporate penalized splines into the mean and covariance structures, and to use regularization 
for high-dimensional data, with important applications in genetics. Furthermore, McGLMs may 


be scaled to test for a common exposure effect in the style of Roy et al. (2003). 


Appendix - Data sets description 

Data set 1: Australian health survey 

Response variables: Ndoc - Number of consultations with a doctor or specialist. Nndoc - 
Number of consultations with health professionals. Nmed - Total number of prescribed and non 
prescribed medications used in the past two days. Nhosp - Number of nights in a hospital during 
the most recent admission. Nadm - Number of admissions to a hospital, psychiatric hospital, 
nursing or convalescence home in the past 12 months. 

Covariates: sex - factor, two levels (0-Male; 1-Female), age - respondent’s age in years divided 
by 100. income - respondent’s annual income in Australian dollars divided by 1000. levyplus - 
factor, two levels (1- if respondent is covered by private health insurance fund for private patients 
in public hospital (with doctor of choice); 0 - otherwise), freepoor - factor, two levels (1 - 
if respondent is covered by government because low income, recent immigrant, unemployed; 
0 - otherwise), freerepa - factor, two levels (1 - if respondent is covered free by government 
because of old-age or disability pension, or because invalid veteran or family of deceased veteran; 
0 - otherwise), illness - number of illnesses in past 2 weeks, with 5 or more weeks coded as 
5. actdays - number of days of reduced activity in the past two weeks due to illness or injury, 
hscore - respondent’s general health questionnaire score using Goldberg’s method; high score 
indicates poor health, chcondl - factor, two levels (1 - if respondent has chronic condition(s) 
but is not limited in activity; 0 - otherwise). chcond2 - factor, two levels (1 if respondent has 
chronic condition(s) and is limited in activity; 0 - otherwise), id - respondent’s index. 

Data set 2: Respiratory physiotherapy on premature newborns 

Response variables: RR - Respiratory rate (continuous). HR - Heart rate (continuous). 02Sat 
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- Oxygen saturation (bounded). 

Covariates: Sex - factor, two levels (Female; Male). GA - Gestational age (weeks). BW - Birth 
weight (mm). APGAR1M - APGAR index in the first minute of life. APGAR5M - APGAR index 
in the fifth minute of life. PRE - factor, two levels (Premature: yes; no). HD - factor, two levels 
(Hansen’s disease, yes; no). SUR - factor, two levels (Surfactant, yes; no). JAU - factor, two 
levels (Jaundice, yes; no). PNE - factor, two levels (Pneumonia, yes; no). PDA - factor, two 
levels (Persistence of ductus arteriosus, yes; no). PPI - factor, two levels (Primary pulmonary 
infection, yes; no). OTHERS - factor, two levels (Other diseases, yes; no). DAYS - Age (days). AUX - 
factor, two levels (Type of respiratory auxiliary, HOOD; OTHERS). TREAT - factor, three levels 
(Respiratory physiotherapy, Evaluation 1; Evaluation 2; Evaluation 3). UNIT - Unit sample 
code. TIME - Days of treatment. 

Data set 3: Venezuelan rainfall data 
Response variable: rainfall - monthly rainfall (mm) 

Covariates: month - month code. Longitude - Longitude (UTM). Latitude - Latitude (UTM). 
height - height above sea level (m). 

Supplement material web page 

http://www.leg.ufpr.br/doku.php/publications:papercompanions:mcglm 
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